Overview

Dataset info

Number of variables: 16
Number of observations: 89914
Missing cells: 25333 (1.8%)
Duplicate rows: 0 (0.0%)
Total size in memory: 11.0 MiB
Average record size in memory: 128.0 B
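
The percentages and sizes above can be reproduced from the raw counts. The following is a small sanity check using only the figures reported in this section; the 8-bytes-per-value figure is an assumption (it is what makes 16 columns come out at 128 B per record), not something the report states.

```python
# Figures copied from the "Dataset info" block.
n_vars = 16
n_obs = 89914
missing_cells = 25333

# Share of missing cells over all cells in the table.
total_cells = n_vars * n_obs
missing_pct = 100 * missing_cells / total_cells
print(f"Missing cells: {missing_pct:.1f}%")

# Assumption: every value occupies 8 bytes (64-bit number or object pointer).
record_bytes = n_vars * 8
total_mib = record_bytes * n_obs / 2**20
print(f"Record size: {record_bytes} B, total: {total_mib:.1f} MiB")
```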

Variables types

Numeric: 4
Categorical: 5
Boolean: 3
Date: 0
URL: 0
Text (Unique): 0
Rejected: 4
Unsupported: 0

Warnings

chemical_name has a high cardinality: 310 distinct values (Warning)
city has a high cardinality: 379 distinct values (Warning)
county has a high cardinality: 92 distinct values (Warning)
facility_name has a high cardinality: 1521 distinct values (Warning)
region has constant value "5" (Rejected)
release_estimate_amount is highly skewed (γ1 = 45.90390638) (Skewed)
release_estimate_amount has 31631 (35.2%) zeros (Zeros)
reporting_year is highly correlated with doc_ctrl_num (ρ = 0.9999999865) (Rejected)
state has constant value "IL" (Rejected)
street_address has a high cardinality: 1632 distinct values (Warning)
total_release is highly correlated with release_estimate_amount (ρ = 1) (Rejected)
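
The four rejected variables fall into two groups: constant columns (region, state) and columns that are almost perfectly correlated with another numeric column (reporting_year, total_release). A minimal sketch of how such columns could be dropped with pandas is shown below. The function name `drop_rejected`, the 0.9 threshold, and the rule of keeping the first column of a correlated pair are assumptions made for illustration, not the profiling tool's actual implementation.

```python
import pandas as pd


def drop_rejected(df: pd.DataFrame, corr_threshold: float = 0.9) -> pd.DataFrame:
    """Drop constant columns and the later column of each highly correlated pair."""
    # A column with at most one distinct value carries no information.
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    kept = df.drop(columns=constant)

    # Absolute Pearson correlation between the remaining numeric columns.
    corr = kept.select_dtypes("number").corr().abs()
    cols = list(corr.columns)
    to_drop = set()
    for i, first in enumerate(cols):
        if first in to_drop:
            continue
        for second in cols[i + 1:]:
            if second not in to_drop and corr.loc[first, second] > corr_threshold:
                # Assumed rule: keep the earlier column, reject the later one.
                to_drop.add(second)
    return kept.drop(columns=sorted(to_drop))


# Tiny illustrative frame using column names from this report.
sample = pd.DataFrame({
    "region": [5, 5, 5, 5],
    "state": ["IL", "IL", "IL", "IL"],
    "release_estimate_amount": [0.0, 5.0, 250.0, 10.0],
    "total_release": [0.0, 5.0, 250.0, 10.0],
    "zip": [60439, 62914, 60410, 60638],
})
cleaned = drop_rejected(sample)
print(list(cleaned.columns))
```

Note that total_release contains NaN values in the sample rows, so it is not a byte-for-byte copy of release_estimate_amount even though the correlation is reported as 1; dropping it is a judgment call rather than a lossless step.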

Variables

carcinogen_chem_ind
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value  Count  Frequency (%)
N      66737  74.2%
Y      23177  25.8%

chem_ind_3350
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value  Count  Frequency (%)
N      62882  69.9%
Y      27032  30.1%

chemical_name
Categorical

Distinct count: 310
Unique (%): 0.3%
Missing (%): 0.1%
Missing (n): 47

Value                   Count  Frequency (%)
LEAD                    5164   5.7%
LEAD COMPOUNDS          4258   4.7%
ZINC COMPOUNDS          3852   4.3%
TOLUENE                 3211   3.6%
COPPER                  3133   3.5%
NICKEL                  2959   3.3%
MANGANESE               2949   3.3%
CERTAIN GLYCOL ETHERS   2821   3.1%
XYLENE (MIXED ISOMERS)  2799   3.1%
CHROMIUM                2646   2.9%
Other values (299)      56075  62.4%

Max length: 69
Mean length: 15.23538047
Min length: 3
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True
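
With 310 distinct values, chemical_name is flagged as high cardinality, and the ten most frequent chemicals cover only about 38% of the rows. One common way to make such a column easier to summarize is to keep the most frequent values and fold the rest into a single bucket, mirroring the "Other values" row in the table above. The helper below is a sketch under that assumption; the name `collapse_rare` and the "OTHER" label are hypothetical.

```python
import pandas as pd


def collapse_rare(values: pd.Series, top_n: int = 10, other: str = "OTHER") -> pd.Series:
    """Keep the top_n most frequent values and replace the rest with `other`."""
    top = values.value_counts().nlargest(top_n).index
    # Missing values are left untouched so they are still counted as missing.
    return values.where(values.isin(top) | values.isna(), other)


chemicals = pd.Series(["LEAD", "LEAD", "TOLUENE", "COPPER", None])
collapsed = collapse_rare(chemicals, top_n=1)
print(collapsed.tolist())
```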

city
Categorical

Distinct count: 379
Unique (%): 0.4%
Missing (%): 0.0%
Missing (n): 0

Value               Count  Frequency (%)
CHICAGO             7597   8.4%
SAUGET              3297   3.7%
ELK GROVE VILLAGE   2681   3.0%
DECATUR             2215   2.5%
CHANNAHON           2205   2.5%
GRANITE CITY        1857   2.1%
ROCKFORD            1806   2.0%
LEMONT              1364   1.5%
FRANKLIN PARK       1339   1.5%
BEDFORD PARK        1263   1.4%
Other values (369)  64290  71.5%

Max length: 20
Mean length: 8.601830638
Min length: 3
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

clean_air_act_chem_ind
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing (%): 0.0%
Missing (n): 0

Value  Count  Frequency (%)
Y      62907  70.0%
N      27007  30.0%
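
The three Boolean variables (carcinogen_chem_ind, chem_ind_3350, clean_air_act_chem_ind) are stored as "Y"/"N" strings. A small sketch of converting such a flag column to real booleans follows; the helper name `yn_to_bool` is hypothetical, and any value other than "Y" or "N" would become missing.

```python
import pandas as pd


def yn_to_bool(flags: pd.Series) -> pd.Series:
    """Map 'Y'/'N' flag strings to True/False."""
    return flags.map({"Y": True, "N": False})


flags = pd.Series(["Y", "N", "N", "Y"])
as_bool = yn_to_bool(flags)
print(as_bool.tolist())
print(as_bool.mean())  # share of "Y" rows
```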
 

county
Categorical

Distinct count: 92
Unique (%): 0.1%
Missing (%): 0.0%
Missing (n): 0

Value              Count  Frequency (%)
COOK               26266  29.2%
WILL               6549   7.3%
MADISON            4969   5.5%
ST CLAIR           4523   5.0%
DUPAGE             3887   4.3%
WINNEBAGO          2828   3.1%
LAKE               2715   3.0%
KANE               2596   2.9%
MACON              2219   2.5%
ROCK ISLAND        2172   2.4%
Other values (82)  31190  34.7%

Max length: 11
Mean length: 5.861523233
Min length: 3
Contains chars: True
Contains digits: False
Contains spaces: True
Contains non-words: True

county_code
Numeric

Distinct count: 92
Unique (%): 0.1%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 17095.16881
Minimum: 17001
Maximum: 17203
Zeros (%): 0.0%

Quantile statistics

Minimum: 17001
5-th percentile: 17031
Q1: 17031
Median: 17091
Q3: 17157
95-th percentile: 17197
Maximum: 17203
Range: 202
Interquartile range: 126

Descriptive statistics

Standard deviation: 63.0064935
Coef of variation: 0.00368563155
Kurtosis: -1.336177609
Mean: 17095.16881
MAD: 55.66664327
Skewness: 0.3205553567
Sum: 1537095008
Variance: 3969.818223
Memory size: 702.5 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[17001. 17002. 17004. 17006. 17009. ... 17196. 17198. 17200. 17202. 17203.], "bayesian blocks" binning strategy used)

Value              Count  Frequency (%)
17031              26266  29.2%
17197              6549   7.3%
17119              4969   5.5%
17163              4523   5.0%
17043              3887   4.3%
17201              2828   3.1%
17097              2715   3.0%
17089              2596   2.9%
17115              2219   2.5%
17161              2172   2.4%
Other values (82)  31190  34.7%

Minimum 5 values

Value  Count  Frequency (%)
17001  892    1.0%
17003  40     < 0.1%
17005  87     0.1%
17007  628    0.7%
17011  117    0.1%

Maximum 5 values

Value  Count  Frequency (%)
17203  445    0.5%
17201  2828   3.1%
17199  633    0.7%
17197  6549   7.3%
17195  530    0.6%
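
county_code is profiled as Numeric, but its value counts match the county column exactly (17031 and COOK both appear 26266 times), which suggests it is an identifier rather than a measurement, so statistics such as the mean or skewness are not meaningful. Assuming the values are five-digit FIPS codes (two-digit state prefix followed by a three-digit county code), they can be kept as strings and split as sketched below; `split_fips` is a hypothetical helper.

```python
def split_fips(code: int) -> tuple[str, str]:
    """Split a five-digit FIPS code into (state, county) string parts."""
    text = f"{int(code):05d}"  # zero-pad so leading zeros are not lost
    return text[:2], text[2:]


state_part, county_part = split_fips(17031)
print(state_part, county_part)
```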
 

doc_ctrl_num
Numeric

Distinct count: 51250
Unique (%): 57.0%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 1.311596979e+12
Minimum: 1.305203139e+12
Maximum: 1.318217591e+12
Zeros (%): 0.0%

Quantile statistics

Minimum: 1.305203139e+12
5-th percentile: 1.305203856e+12
Q1: 1.308206511e+12
Median: 1.311209845e+12
Q3: 1.315213936e+12
95-th percentile: 1.318216984e+12
Maximum: 1.318217591e+12
Range: 1.30144514e+10
Interquartile range: 7007425101

Descriptive statistics

Standard deviation: 4046086033
Coef of variation: 0.003084854645
Kurtosis: -1.214671563
Mean: 1.311596979e+12
MAD: 3512885753
Skewness: 0.02850956773
Sum: 1.179309308e+17
Variance: 1.637081219e+19
Memory size: 702.5 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[1.30520314e+12 1.30520315e+12 1.30520316e+12 1.30520318e+12 1.30520318e+12 ... 1.31821757e+12 1.31821759e+12 1.31821759e+12 1.31821759e+12 1.31821759e+12], "bayesian blocks" binning strategy used)

Value                 Count  Frequency (%)
1.306204299e+12       2      < 0.1%
1.315213608e+12       2      < 0.1%
1.315213783e+12       2      < 0.1%
1.315214378e+12       2      < 0.1%
1.315214339e+12       2      < 0.1%
1.315214333e+12       2      < 0.1%
1.31521431e+12        2      < 0.1%
1.315214202e+12       2      < 0.1%
1.315214299e+12       2      < 0.1%
1.315214278e+12       2      < 0.1%
Other values (51240)  89894  > 99.9%

Minimum 5 values

Value            Count  Frequency (%)
1.305203139e+12  1      < 0.1%
1.30520314e+12   2      < 0.1%
1.30520314e+12   2      < 0.1%
1.30520314e+12   2      < 0.1%
1.30520314e+12   1      < 0.1%

Maximum 5 values

Value            Count  Frequency (%)
1.318217591e+12  2      < 0.1%
1.318217591e+12  1      < 0.1%
1.318217587e+12  1      < 0.1%
1.318217587e+12  2      < 0.1%
1.318217587e+12  2      < 0.1%
 

facility_name
Categorical

Distinct count: 1521
Unique (%): 1.7%
Missing (%): 0.0%
Missing (n): 0

Value                                           Count  Frequency (%)
VEOLIA ES TECHNICAL SOLUTIONS LLC               2172   2.4%
SHERWIN-WILLIAMS CO                             1277   1.4%
ADM DECATUR COMPLEX                             1205   1.3%
WOOD RIVER REFINERY                             1099   1.2%
CITGO PETROLEUM CORP LEMONT REFINERY            890    1.0%
EXXONMOBIL OIL CORP JOLIET REFINERY             873    1.0%
3M CO - CORDOVA                                 869    1.0%
MARATHON PETROLEUM CO LP ILLINOIS REFINING DIV  863    1.0%
US STEEL GRANITE CITY WORKS                     788    0.9%
ROHM & HAAS CHEMICALS LLC                       593    0.7%
Other values (1511)                             79285  88.2%

Max length: 62
Mean length: 24.18462086
Min length: 3
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True

region
Constant

This variable is constant and should be ignored for analysis

Constant value: 5

release_estimate_amount
Numeric

Distinct count: 15101
Unique (%): 16.8%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 4919.633186
Minimum: 0
Maximum: 5680419
Zeros (%): 35.2%

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 5
Q3: 262
95-th percentile: 11682.75
Maximum: 5680419
Range: 5680419
Interquartile range: 262

Descriptive statistics

Standard deviation: 57924.88773
Coef of variation: 11.774229
Kurtosis: 2892.405466
Mean: 4919.633186
MAD: 8539.290659
Skewness: 45.90390638
Sum: 442343898.3
Variance: 3355292618
Memory size: 702.5 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-08 1.050000e-05 5.065000e-04 9.905000e-04 ... 3.302240e+05 5.718280e+05 9.556535e+05 3.780950e+06 5.680419e+06], "bayesian blocks" binning strategy used)

Value                 Count  Frequency (%)
0                     31631  35.2%
5                     4170   4.6%
1                     2387   2.7%
250                   1722   1.9%
2                     1048   1.2%
3                     676    0.8%
10                    545    0.6%
4                     527    0.6%
6                     419    0.5%
0.1                   408    0.5%
Other values (15091)  46381  51.6%

Minimum 5 values

Value  Count  Frequency (%)
0      31631  35.2%
1e-07  1      < 0.1%
3e-07  1      < 0.1%
4e-07  1      < 0.1%
6e-07  1      < 0.1%

Maximum 5 values

Value    Count  Frequency (%)
5680419  1      < 0.1%
3845400  1      < 0.1%
3716500  1      < 0.1%
3623400  1      < 0.1%
3489000  1      < 0.1%
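
The statistics above describe a distribution with a median of 5, a maximum of 5680419, and 35.2% zeros, which is why the variable is flagged as highly skewed. A common way to compress such a range while keeping zeros valid is the log(1 + x) transform. The sketch below applies it to the quantile values reported above; it illustrates one possible preprocessing step and is not something the report itself performs.

```python
import numpy as np
import pandas as pd

# Minimum, Q1, median, Q3, 95-th percentile, and maximum from the report.
amounts = pd.Series([0, 0, 5, 262, 11682.75, 5680419])

# log1p(x) = log(1 + x): zero stays zero, large values are compressed.
logged = np.log1p(amounts)

print(logged.round(2).tolist())
print("skew before:", amounts.skew(), "after:", logged.skew())
```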
 

reporting_year
Highly correlated

This variable is highly correlated with doc_ctrl_num and should be ignored for analysis

Correlation: 0.9999999865

state
Constant

This variable is constant and should be ignored for analysis

Constant value: IL

street_address
Categorical

Distinct count: 1632
Unique (%): 1.8%
Missing (%): 0.0%
Missing (n): 0

Value                Count  Frequency (%)
7 MOBILE AVE         2172   2.4%
4666 FARIES PKWY E   1205   1.3%
900 S CENTRAL AVE    1099   1.2%
135TH ST & NEW AVE   890    1.0%
25915 S FRONTAGE RD  873    1.0%
22614 RT 84 N        869    1.0%
100 MARATHON AVE     863    1.0%
1951 STATE ST        788    0.9%
99 E COTTAGE AVE     502    0.6%
10901 BALDWIN RD     492    0.5%
Other values (1622)  80161  89.2%

Max length: 36
Mean length: 16.34456258
Min length: 5
Contains chars: True
Contains digits: True
Contains spaces: True
Contains non-words: True

total_release
Highly correlated

This variable is highly correlated with release_estimate_amount and should be ignored for analysis

Correlation: 1

zip
Numeric

Distinct count: 487
Unique (%): 0.5%
Missing (%): 0.0%
Missing (n): 0
Infinite (%): 0.0%
Infinite (n): 0
Mean: 15798153.46
Minimum: 60002
Maximum: 626502999
Zeros (%): 0.0%

Quantile statistics

Minimum: 60002
5-th percentile: 60031
Q1: 60411
Median: 60827
Q3: 62002
95-th percentile: 62832
Maximum: 626502999
Range: 626442997
Interquartile range: 1591

Descriptive statistics

Standard deviation: 97272694.47
Coef of variation: 6.15721924
Kurtosis: 34.2653051
Mean: 15798153.46
MAD: 30671164.14
Skewness: 6.021109623
Sum: 1.420475171e+12
Variance: 9.461977089e+15
Memory size: 702.5 KiB

Histogram with fixed size bins (bins=50)
Histogram with variable size bins (bins=[6.00020000e+04 6.00035000e+04 6.00060000e+04 6.00075000e+04 6.00100000e+04 ... 6.18714912e+08 6.21330664e+08 6.23663432e+08 6.25884332e+08 6.26502999e+08], "bayesian blocks" binning strategy used)

Value               Count  Frequency (%)
62201               3071   3.4%
60007               2670   3.0%
60410               2201   2.4%
62040               1857   2.1%
60439               1364   1.5%
60131               1301   1.4%
60638               1274   1.4%
60450               1245   1.4%
625265666           1205   1.3%
60901               1180   1.3%
Other values (477)  72546  80.7%

Minimum 5 values

Value  Count  Frequency (%)
60002  73     0.1%
60005  741    0.8%
60007  2670   3.0%
60008  184    0.2%
60012  82     0.1%

Maximum 5 values

Value      Count  Frequency (%)
626502999  40     < 0.1%
625265666  1205   1.3%
622061198  64     0.1%
620600129  2      < 0.1%
619201182  2      < 0.1%
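
The mean of 15798153.46 and the maximum of 626502999 are far outside the five-digit range of the median (60827). The nine-digit values look like ZIP+4 codes stored without a hyphen (for example 625265666 read as 62526-5666); that interpretation is an assumption based on the values listed above. Under that assumption, a sketch of normalizing the column to five-digit strings could look like this; `normalize_zip` is a hypothetical helper.

```python
def normalize_zip(raw: int) -> str:
    """Return the five-digit ZIP code as a string.

    Assumption: nine-digit values are ZIP+4 codes without a hyphen,
    so only the first five digits are kept.
    """
    text = str(int(raw))
    if len(text) == 9:
        return text[:5]
    return text.zfill(5)


for value in (60410, 625265666, 626502999):
    print(value, "->", normalize_zip(value))
```

Treating the result as a string also avoids computing means and variances on what is effectively a categorical identifier.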
 

Correlations

Missing values

Sample

First rows

index | carcinogen_chem_ind | chem_ind_3350 | chemical_name | city | clean_air_act_chem_ind | county | county_code | doc_ctrl_num | facility_name | region | release_estimate_amount | reporting_year | state | street_address | total_release | zip
0 | N | N | ETHYLENE GLYCOL | LEMONT | Y | COOK | 17031 | 1.305203e+12 | CCI MANUFACTURING IL CORP | 5 | 3930.00 | 2005 | IL | 15550 CANAL BANK RD | 3930.00 | 60439
1 | N | Y | CHROMIUM | CHICAGO | Y | COOK | 17031 | 1.305203e+12 | GE MATHIS CO | 5 | 0.00 | 2005 | IL | 6100 S OAK PARK AVE | 0.00 | 60638
2 | N | N | DECABROMODIPHENYL OXIDE | SOUTH HOLLAND | N | COOK | 17031 | 1.305203e+12 | ARMACELL LLC | 5 | 0.00 | 2005 | IL | 16800 S CANAL ST | 0.00 | 60473
3 | N | N | COPPER | OREGON | N | OGLE | 17141 | 1.305203e+12 | WEC CO | 5 | 0.99 | 2005 | IL | 2606 RT 2 S | 0.99 | 61061
4 | N | N | SULFURIC ACID (1994 AND AFTER "ACID AEROSOLS" ... | BLOOMINGTON | N | MCLEAN | 17113 | 1.305203e+12 | MICKEY TRUCK BODIES INC | 5 | 10.00 | 2005 | IL | 14661 OLD COLONIAL RD | 10.00 | 61704
5 | Y | Y | NICKEL | STREATOR | Y | LASALLE | 17099 | 1.305203e+12 | STREATOR DEPENDABLE | 5 | 0.00 | 2005 | IL | 410 W BROADWAY AVE | NaN | 61364
6 | Y | Y | LEAD | BUSHNELL | Y | MCDONOUGH | 17109 | 1.305203e+12 | VAUGHAN & BUSHNELL MANUFACTURING CO | 5 | 0.00 | 2005 | IL | 201 W MAIN ST | NaN | 61422
7 | N | N | SEC-BUTYL ALCOHOL | CHANNAHON | N | WILL | 17197 | 1.305203e+12 | IMTT ILLINOIS - JOLIET FACILITY | 5 | 110.00 | 2005 | IL | 24420 W DURKEE RD | 110.00 | 60410
8 | Y | Y | LEAD | WHEELING | Y | COOK | 17031 | 1.305203e+12 | ENGIS CORP | 5 | 0.75 | 2005 | IL | 105 W. HINTZ ROAD | 0.75 | 60090
9 | N | N | N-HEXANE | CAIRO | Y | ALEXANDER | 17003 | 1.305203e+12 | BUNGE NA INC | 5 | 289616.00 | 2005 | IL | 203 34TH ST | 289616.00 | 62914

Last rows

index | carcinogen_chem_ind | chem_ind_3350 | chemical_name | city | clean_air_act_chem_ind | county | county_code | doc_ctrl_num | facility_name | region | release_estimate_amount | reporting_year | state | street_address | total_release | zip
89904 | N | N | 1-BROMOPROPANE | MELROSE PARK | N | COOK | 17031 | 1.318218e+12 | ENVIRO TECH INTERNATIONAL INC | 5 | 750.0 | 2018 | IL | 1800 N 25TH AVE | NaN | 60160
89905 | Y | Y | LEAD | SPRING GROVE | Y | MCHENRY | 17111 | 1.318217e+12 | SCOT FORGE CO | 5 | 0.1 | 2018 | IL | 8001 WINN RD | 0.1 | 60081
89906 | Y | Y | LEAD | LIBERTYVILLE | Y | LAKE | 17097 | 1.318217e+12 | METALEX NORTH | 5 | 0.0 | 2018 | IL | 700 LIBERTY DR | 0.0 | 60048
89907 | N | N | SILVER COMPOUNDS | ELK GROVE VILLAGE | N | COOK | 17031 | 1.318217e+12 | PERFECTION PLATING INC | 5 | 1.0 | 2018 | IL | 775 MORSE AVE | 1.0 | 60007
89908 | Y | Y | NICKEL COMPOUNDS | JOLIET | Y | WILL | 17197 | 1.318217e+12 | APEX MATERIAL TECHNOLOGIES LLC | 5 | 0.0 | 2018 | IL | 10 INDUSTRY AVE | NaN | 60435
89909 | N | Y | CHROMIUM COMPOUNDS(EXCEPT CHROMITE ORE MINED I... | DECATUR | Y | MACON | 17115 | 1.318217e+12 | ADM DECATUR COMPLEX | 5 | 1.0 | 2018 | IL | 4666 FARIES PKWY E | 1.0 | 625265666
89910 | Y | Y | LEAD | CHICAGO | Y | COOK | 17031 | 1.318218e+12 | UNIVERSAL ELECTRIC FOUNDRY INC | 5 | 0.0 | 2018 | IL | 1523 W HUBBARD ST | NaN | 60642
89911 | N | N | ZINC COMPOUNDS | DECATUR | N | MACON | 17115 | 1.318217e+12 | ADM DECATUR COMPLEX | 5 | 0.0 | 2018 | IL | 4666 FARIES PKWY E | 0.0 | 625265666
89912 | N | N | TOLUENE | DECATUR | Y | MACON | 17115 | 1.318217e+12 | ADM DECATUR COMPLEX | 5 | 3.0 | 2018 | IL | 4666 FARIES PKWY E | 3.0 | 625265666
89913 | N | N | PROPYLENE | CHANNAHON | N | WILL | 17197 | 1.318217e+12 | DIVERSIFIED CPC INTERNATIONAL INC | 5 | 921.0 | 2018 | IL | 24338 W DURKEE RD | 921.0 | 60410